For this research, we try to answer the following questions:
1a. Does distance from a productive coast influence the intensity of a site’s occupation?
1b. Does distance from a productive coast influence the density of artifacts found in a site?
2. Can we use shell density as an indicator of the importance of shellfish in prehistoric foragers’ diet?
At the end of each run, two types of cells export information: cells that were occupied by a site at some point, and cells where shellfish was processed. The exported information is:
1. The kcal of shells, plants, and meat brought back to the cell (or processed on it) on each day it was occupied
2. The number of hunting tools that were discarded on it
To answer questions 1a and 1b, we ran linear and third-degree polynomial regressions to evaluate the impact of cells’ distance from the coast on their occupation length and on the size of their artifact assemblages. We then categorized each cell by its coastal status (coastal if adjacent to the ocean, inland otherwise) and, for each run, compiled the average occupation length and the average number of discarded hunting tools on cells in each habitat (coastal vs. inland cells). To explore the yearly difference in occupation and artifact discard, we compared those averages in individual simulations using boxplots and non-parametric Wilcoxon signed-rank tests. To evaluate how continued occupation of one landscape would change its archaeological signature over longer time periods, we compiled the cell values of all 3200 runs, and here again compared their coastal vs. inland occupation and artifact medians using non-parametric Wilcoxon signed-rank tests.
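The per-run comparison described above can be sketched in R as follows. This is a minimal illustration on simulated data, not the analysis code itself: the data frame `cells`, its columns, and the Poisson occupation lengths are all assumptions made for the example.

```r
# Hypothetical per-cell records: one row per occupied cell per run.
set.seed(1)
n_runs <- 20
cells <- data.frame(
  run       = rep(seq_len(n_runs), each = 50),
  distCoast = rep(c(rep(0, 10), 1:40), times = n_runs)  # 0 = adjacent to the ocean
)
# Simulated occupation lengths (assumption: longer stays on the coast).
cells$occupation <- rpois(nrow(cells),
                          lambda = ifelse(cells$distCoast == 0, 20, 12))
cells$habitat <- ifelse(cells$distCoast == 0, "coastal", "inland")

# Average occupation length per run and habitat.
run_means <- aggregate(occupation ~ run + habitat, data = cells, FUN = mean)
run_means <- run_means[order(run_means$habitat, run_means$run), ]
coastal <- run_means$occupation[run_means$habitat == "coastal"]
inland  <- run_means$occupation[run_means$habitat == "inland"]

# Pairing the two averages by run gives the signed-rank variant of the test.
wt <- wilcox.test(coastal, inland, paired = TRUE)
print(wt$p.value)
```

With `paired = TRUE`, `wilcox.test()` performs the signed-rank test; without it, the same function performs the rank-sum (Mann-Whitney) test on two independent samples.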
To answer question 2, we calculated the percentage of the subsistence covered by shellfish in all simulations, then focused only on the cells above the 95th percentile of cumulative occupation (the most occupied) and computed the percentage again. We compared the averages of those two percentages using a Wilcoxon test. We also focused on the coastal cells among the most occupied cells and computed the percentage contribution of shellfish on those. We tested the difference between this average and the average of all cells using a Wilcoxon test. We then plotted the shellfish calories processed at each cell against the cell’s distance from the coast to see whether it matched empirical observations (Jerardino 2016).
Based on the results of the Sensitivity Analyses (see QI_Sensitivity_Analyses.Rmd), the following variables were used to create a representative dataset:
| Variables | Values |
|---|---|
| spatial-foresight | TRUE |
| nrcamps | 5, 15 |
| daily-time-budget | 10, 12 |
| hunter-percent | 0.3 |
| vision-forager | 20 |
| forager-movement | local-patch-choice, random |
| nrforagers | 60 |
| days_of_foresight | 5, 10 |
| global-knowledge? | TRUE |
| walk-speed | 2, 3 |
| point-recycling | 25 |
| point-hunting-rate | 50 |
| processing Threshold | 20, 50 |
Each parameter combination was run 50 times. Each run lasted 365 time steps (days). This created a total of 3200 runs, each representing one year.
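The run count follows from the table: six parameters take two values each, giving 2^6 = 64 combinations, and 64 × 50 = 3200 runs. A minimal sketch of that arithmetic is shown below; the column names are R-safe stand-ins for the model's parameter names, and the parameters held at a single value are omitted.

```r
# Parameters that vary in the table above (fixed parameters are omitted).
grid <- expand.grid(
  nrcamps              = c(5, 15),
  daily_time_budget    = c(10, 12),
  forager_movement     = c("local-patch-choice", "random"),
  days_of_foresight    = c(5, 10),
  walk_speed           = c(2, 3),
  processing_threshold = c(20, 50),
  stringsAsFactors     = FALSE
)

n_combinations <- nrow(grid)   # one row per parameter combination
n_repetitions  <- 50           # repetitions per combination
n_runs <- n_combinations * n_repetitions
print(c(combinations = n_combinations, runs = n_runs))
```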
The following is a map of the covered region, with each biome’s highest productivity.
First, this linear regression considers each cell in each simulation as individual observations (i.e., this is yearly data). This includes all cells that were occupied by a camp at least once in a simulation.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.3 -13.6 -12.1 -7.2 352.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.0500 0.0758 172 <0.0000000000000002 ***
## distCoast 0.8665 0.0074 117 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 55.8 on 683344 degrees of freedom
## Multiple R-squared: 0.0197, Adjusted R-squared: 0.0197
## F-statistic: 1.37e+04 on 1 and 683344 DF, p-value: <0.0000000000000002
The graph shows that there is a lot of variability, which explains the very weak linear fit (R² = 0.020). It also suggests that the relationship may be non-linear, so we ran a third-degree polynomial regression on the dataset to see whether it would improve the fit. The result is only marginally better (R² = 0.026), which still indicates that the relationship between the two variables is weak.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -92.2 -16.1 -10.3 -6.5 355.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.0923 0.0673 254.1 <0.0000000000000002 ***
## poly(distCoast, 3)1 6537.4870 55.6079 117.6 <0.0000000000000002 ***
## poly(distCoast, 3)2 -754.9714 55.6079 -13.6 <0.0000000000000002 ***
## poly(distCoast, 3)3 3729.5058 55.6079 67.1 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 55.6 on 683342 degrees of freedom
## Multiple R-squared: 0.0264, Adjusted R-squared: 0.0264
## F-statistic: 6.17e+03 on 3 and 683342 DF, p-value: <0.0000000000000002
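For readers who want to reproduce this kind of comparison, the two fits above can be sketched as follows. The data are simulated and the response column `occupation` is a hypothetical name; only `distCoast` and the `poly(distCoast, 3)` term come from the output above.

```r
set.seed(42)
ds <- data.frame(distCoast = runif(2000, 0, 40))
# Hypothetical noisy, weakly non-linear response.
ds$occupation <- 15 + 0.5 * ds$distCoast - 0.01 * ds$distCoast^2 +
  rnorm(2000, sd = 50)

fit_linear <- lm(occupation ~ distCoast, data = ds)
fit_cubic  <- lm(occupation ~ poly(distCoast, 3), data = ds)

# Share of variance explained by each model.
r2 <- c(linear = summary(fit_linear)$r.squared,
        cubic  = summary(fit_cubic)$r.squared)
print(round(r2, 4))

# Nested-model F test: does the cubic fit explain significantly more variance?
cmp <- anova(fit_linear, fit_cubic)
print(cmp)
```

Because the linear model is nested in the cubic one, the cubic R² can never be lower. With several hundred thousand observations even a tiny gain will be statistically significant, so the size of the R² change is the more informative quantity.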
We then ran the same linear regression on the dataset aggregated by cell (that is, summing the length of occupation and the artifact assemblage over all 3200 simulations for each cell). Note that the graph axes are log-scaled to make the data easier to see.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -221 -180 -134 -12 214948
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 222.85 9.95 22.40 <0.0000000000000002 ***
## distCoast -4.83 0.55 -8.78 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1820 on 73837 degrees of freedom
## Multiple R-squared: 0.00104, Adjusted R-squared: 0.00103
## F-statistic: 77 on 1 and 73837 DF, p-value: <0.0000000000000002
The results of the linear regression and the graph suggest that distance to coast does not linearly predict the occupation length of a given cell (R² = 0.001), but that there is still a non-linear relationship, especially when considering the palimpsest created by all runs. Unfortunately, a third-degree polynomial regression does not improve the fit much, which is due to the large variability in the data.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -494 -193 -53 8 214675
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 158.18 6.67 23.7 <0.0000000000000002 ***
## poly(distCoast, 3)1 -15945.75 1811.92 -8.8 <0.0000000000000002 ***
## poly(distCoast, 3)2 27313.97 1811.92 15.1 <0.0000000000000002 ***
## poly(distCoast, 3)3 -23736.66 1811.92 -13.1 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1810 on 73835 degrees of freedom
## Multiple R-squared: 0.00641, Adjusted R-squared: 0.00637
## F-statistic: 159 on 3 and 73835 DF, p-value: <0.0000000000000002
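The per-cell aggregation behind the palimpsest regressions above can be sketched as follows. The data frame `yearly` and its column names are hypothetical, chosen only to illustrate summing yearly records over runs.

```r
# Hypothetical yearly records: one row per occupied cell per run.
yearly <- data.frame(
  run        = c(1, 1, 2, 2, 3),
  cell_id    = c("a", "b", "a", "c", "a"),
  distCoast  = c(0, 5, 0, 12, 0),
  occupation = c(10, 3, 20, 1, 5),
  tools      = c(1, 0, 2, 0, 1)
)

# Palimpsest: sum occupation and discarded tools over all runs for each cell.
# distCoast is constant within a cell, so it is kept as a grouping variable.
palimpsest <- aggregate(cbind(occupation, tools) ~ cell_id + distCoast,
                        data = yearly, FUN = sum)

# Mean occupation per cell, for comparison with the summed values.
cell_means <- aggregate(occupation ~ cell_id, data = yearly, FUN = mean)
print(palimpsest)
print(cell_means)
```

Summing rewards cells that are reoccupied in many runs, whereas the mean only reflects how long a cell is occupied when it is occupied at all.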
We separated all cells into coastal vs. inland. Coastal cells are the ones with vegetation types 10 to 14 (TMS and Sandy Beach). They are the cells immediately adjacent to the ocean.
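A minimal sketch of this classification, assuming a hypothetical data frame `cells` with a `vegetation_type` column holding the numeric vegetation codes:

```r
# Hypothetical cells with numeric vegetation codes.
cells <- data.frame(
  cell_id         = 1:6,
  vegetation_type = c(3, 10, 12, 14, 15, 7)
)

# Codes 10 to 14 are the coastal vegetation types (TMS and Sandy Beach).
coastal_types <- 10:14
cells$coastal <- ifelse(cells$vegetation_type %in% coastal_types,
                        "Coastal", "Non-coastal")
print(table(cells$coastal))
```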
Then we compared the length of occupation and assemblage sizes in coastal vs. inland cells.
This first figure is for each site in each individual simulation (yearly dataset). The notches on the sides of the boxplots show the extent of the confidence interval around the median. Note that the y axis is on a log scale here for better visibility.
The p-value between the two medians is <0.0000000000000002.
This second figure is for the palimpsest data. Note that the y axis is on a log scale here for better visibility.
The p-value between the two medians is <0.0000000000000002.
The difference is significant at both time scales.
Therefore, the answer to question 1a is: Distance from the coast is not a great predictor of occupation length, as there is a lot of variability in the data, but coastal cells have longer occupations than non-coastal cells, especially at the long time scale.
The impact of the coast is even more visible when plotting those values on a map. The first map shows the mean of the occupation lengths for each cell in all simulations, whereas the second map shows the sum of occupations. This shows that accumulation (palimpsest) and reoccupation of the same cells on the coast has a strong impact on cells’ length of occupation.
## [1] 366
## [1] 215170
We first look at individual simulations separately (yearly data). This dataset includes only the sites with at least one discarded hunting tool.
The following linear regression shows the impact of distance from the coast on a cell’s assemblage of discarded hunting tools.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.167 -0.082 -0.079 -0.079 3.921
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.079038 0.002262 476.98 < 0.0000000000000002 ***
## distCoast 0.001845 0.000227 8.14 0.00000000000000042 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.317 on 23083 degrees of freedom
## Multiple R-squared: 0.00286, Adjusted R-squared: 0.00282
## F-statistic: 66.3 on 1 and 23083 DF, p-value: 0.000000000000000416
While the graph suggests a slight correlation between the values, the regression shows that the linear relationship is very weak (R² = 0.003). A third-degree polynomial improves the fit only slightly (R² = 0.0034).
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.169 -0.084 -0.076 -0.076 3.924
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.08612 0.00209 520.19 < 0.0000000000000002 ***
## poly(distCoast, 3)1 2.58282 0.31723 8.14 0.00000000000000041 ***
## poly(distCoast, 3)2 -0.96520 0.31723 -3.04 0.0023 **
## poly(distCoast, 3)3 0.62042 0.31723 1.96 0.0505 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.317 on 23081 degrees of freedom
## Multiple R-squared: 0.00343, Adjusted R-squared: 0.0033
## F-statistic: 26.5 on 3 and 23081 DF, p-value: <0.0000000000000002
The following regression is on palimpsest data (here again, including only cells where at least one hunting tool was discarded). Note that the y axis is on a log scale here for better visibility.
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9 -5.3 -3.9 -1.8 522.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.8992 0.3387 20.4 <0.0000000000000002 ***
## distCoast -0.2111 0.0206 -10.3 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.8 on 7400 degrees of freedom
## Multiple R-squared: 0.014, Adjusted R-squared: 0.0139
## F-statistic: 105 on 1 and 7400 DF, p-value: <0.0000000000000002
Here we can see that, while the regression remains weak (R² = 0.014), the graph clearly shows that the largest assemblages occur only in cells close to the ocean. A third-degree polynomial regression improves the fit slightly (R² = 0.049).
##
## Call:
## lm(formula = fmla, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2 -6.7 -2.7 2.4 516.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.73 0.26 18.2 <0.0000000000000002 ***
## poly(distCoast, 3)1 -233.79 22.37 -10.4 <0.0000000000000002 ***
## poly(distCoast, 3)2 261.27 22.37 11.7 <0.0000000000000002 ***
## poly(distCoast, 3)3 -258.45 22.37 -11.6 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.4 on 7398 degrees of freedom
## Multiple R-squared: 0.0487, Adjusted R-squared: 0.0483
## F-statistic: 126 on 3 and 7398 DF, p-value: <0.0000000000000002
Then we compared the assemblage sizes in coastal vs. inland cells, using the complete dataset of cells with at least one hunting tool. This first figure is for each site in each individual simulation (yearly values). Note that the y axis is on a log scale here for better visibility.
The p-value between the two medians is 0.015. This graph is difficult to read as most values are 1 for both regions.
The second graph is for palimpsest data. Note that the y axis is on a log scale here for better visibility.
The p-value between the two medians is <0.0000000000000002.
This shows more clearly the difference between assemblage size at coastal vs inland sites.
Here again, we can map the mean and summed size of assemblages per cell to see if the coastal vs. inland separation shows up.
## [1] 5
## [1] 520
These maps show clearly that the reoccupation of the same cells has an important impact on the size of the assemblage accumulated on those cells.
It is interesting to see that higher means are found in Sand Fynbos, where most of the hunting takes place, but that sums are higher on the coast.
The answer to question 1b is: Distance from the coast is a weak but significant predictor of the size of artifact assemblages, but coastal cells do have bigger assemblages than inland cells.
For each simulation, we calculated the total kcal consumed and the total kcal from each food source. We then used those numbers to calculate the ratio of each food source in the diet.
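The calculation can be sketched as follows. The data frame `sims` and its column names are hypothetical, and the sketch assumes the shares are computed per simulation and then averaged across simulations.

```r
# Hypothetical per-simulation totals, in kcal.
sims <- data.frame(
  run       = 1:3,
  kcalShell = c(5, 10, 0),
  kcalMeat  = c(5, 10, 10),
  kcalPlant = c(90, 80, 90)
)

# Total consumed per simulation, then each source's share of that total.
sims$kcalTotal <- sims$kcalShell + sims$kcalMeat + sims$kcalPlant
sims$percShell <- 100 * sims$kcalShell / sims$kcalTotal
sims$percMeat  <- 100 * sims$kcalMeat  / sims$kcalTotal
sims$percPlant <- 100 * sims$kcalPlant / sims$kcalTotal

# Average share of each source across simulations.
diet <- colMeans(sims[, c("percShell", "percMeat", "percPlant")])
print(round(diet, 2))
```

Averaging per-simulation shares weights every simulation equally. Pooling all kcal before dividing would instead weight simulations by how much was consumed in them.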
All cells
The following table shows the relative contribution of each food source in all cells and all simulations. This includes all cells (even the ones not occupied by a camp, but where shellfish processing occurred).
## # A tibble: 1 x 4
## meanKcalPerSim percShell percMeat percPlant
## <dbl> <dbl> <dbl> <dbl>
## 1 58252020. 5.32 4.65 90.0
Across all simulations, a mean of 58,252,020 kcal is consumed per simulation, and shellfish accounts on average (mean) for 5.32% of the diet.
Most occupied cells
Then, we focused on the cells that were occupied the longest and recalculated the average shellfish contribution. Let’s assume that archaeological sites will be found only if they reach a certain level of reoccupation. Focusing on those reoccupied cells (total length of occupation above the 95th percentile), the shellfish contribution to the kcal is 8.704%.
## # A tibble: 1 x 3
## percShell percMeat percPlant
## <dbl> <dbl> <dbl>
## 1 8.70 4.82 86.5
A Wilcoxon test run on the difference between this sample of well-occupied cells and the whole landscape has a p-value of: <0.0000000000000002.
So, when we focus on the most occupied cells, the percentage of shell contribution increases significantly (from 5.32% to 8.70%) and no longer represents the diet across the whole landscape. This shows that we could over-estimate the amount of shellfish consumed by prehistoric people.
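The subsetting step can be sketched as follows, using a hypothetical data frame `cells` in which the most occupied cells are assumed to process more shellfish. The values are invented to show the mechanics only.

```r
# Hypothetical per-cell palimpsest totals.
cells <- data.frame(cell_id = 1:100, occupation = 1:100)
cells$kcalTotal <- 100
# Assumption for illustration: the ten most occupied cells process more shellfish.
cells$kcalShell <- ifelse(cells$cell_id > 90, 20, 5)
cells$percShell <- 100 * cells$kcalShell / cells$kcalTotal

# Keep only cells above the 95th percentile of cumulative occupation.
threshold <- quantile(cells$occupation, probs = 0.95)
most_occupied <- cells[cells$occupation > threshold, ]

# Shellfish share over the whole landscape vs. the most occupied cells.
share_all <- 100 * sum(cells$kcalShell) / sum(cells$kcalTotal)
share_top <- 100 * sum(most_occupied$kcalShell) / sum(most_occupied$kcalTotal)
print(c(all = share_all, most_occupied = share_top))

# Unpaired Wilcoxon (rank-sum) test; exact = FALSE because the data contain ties.
wt <- wilcox.test(most_occupied$percShell, cells$percShell, exact = FALSE)
print(wt$p.value)
```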
This pattern is even stronger when we focus on the most popular coastal sites only:
## # A tibble: 2 x 4
## coastal percShell percMeat percPlant
## <chr> <dbl> <dbl> <dbl>
## 1 Coastal 14.4 5.02 80.6
## 2 Non-coastal 1.83 4.58 93.6
Comparing the two averages (most reused coastal cells vs all cells) has the following p-value: <0.0000000000000002.
The answer to question 2 is: No, we cannot equate the shellfish density in coastal archaeological sites with the importance of shellfish in prehistoric people’s diets, because coastal sites that are visited more often contain more shellfish refuse than the usual contribution of shellfish to the diet would produce.
We follow this with a few graphs showing the relationship between cells’ distance from the coast and the amount of shellfish processed (and eaten, in most cases) at each cell.
We compare the graphs shown above to similar graphs using empirical data from South and West African archaeological sites (compiled by Jerardino 2016). Here is the similar graph based on Jerardino’s table 2.
Out of curiosity, we computed third degree polynomial regressions on this dataset.
##
## Call:
## lm(formula = `MNI/m3 (average)` ~ poly(DistanceToShoreCorr, 3),
## data = jerardino, na.rm = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10971 -3473 -614 2561 18075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10313 1335 7.73 0.00000014 ***
## poly(DistanceToShoreCorr, 3)1 -15468 6794 -2.28 0.0334 *
## poly(DistanceToShoreCorr, 3)2 -4558 6710 -0.68 0.5045
## poly(DistanceToShoreCorr, 3)3 19843 6725 2.95 0.0076 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6660 on 21 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.412, Adjusted R-squared: 0.328
## F-statistic: 4.91 on 3 and 21 DF, p-value: 0.00974
##
## Call:
## lm(formula = `kg/m3 average` ~ poly(DistanceToShoreCorr, 3),
## data = jerardino, na.rm = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -143.2 -79.4 -12.7 71.1 166.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 138.2 19.4 7.13 0.00000037 ***
## poly(DistanceToShoreCorr, 3)1 -206.5 98.8 -2.09 0.048 *
## poly(DistanceToShoreCorr, 3)2 -52.3 98.8 -0.53 0.602
## poly(DistanceToShoreCorr, 3)3 278.1 98.8 2.82 0.010 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 98.8 on 22 degrees of freedom
## Multiple R-squared: 0.364, Adjusted R-squared: 0.277
## F-statistic: 4.19 on 3 and 22 DF, p-value: 0.0173
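The results show fair relationships between the distance from the coast and both shellfish abundance proxies (R² = 0.41 for MNI/m3 and R² = 0.36 for kg/m3, both significant at p < 0.05), except that the second-degree term is not statistically significant in either model.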